Filtering duplicate reads from 454 pyrosequencing data
Authors
Abstract
MOTIVATION In recent years, 454 pyrosequencing has emerged as an efficient alternative to traditional Sanger sequencing and is widely used in both de novo whole-genome sequencing and metagenomics. The latter application in particular is extremely sensitive to sequencing errors and artificially duplicated reads. Both are common in 454 pyrosequencing and can strongly bias estimates of the diversity and composition of a sample. Several existing tools aim to remove both sequencing noise and duplicates. However, duplicate removal is often based on nucleotide sequences rather than on the underlying flow values, which contain additional information.
RESULTS With the novel tool JATAC, we present an approach towards more accurate duplicate removal that analyses flow values directly. Building on previous findings about the characteristics of 454 flow data, we combine read clustering with Bayesian distance measures. Finally, we benchmark JATAC against an existing algorithm.
AVAILABILITY JATAC is freely available under the General Public License from http://malde.org/ketil/jatac/.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
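The abstract does not spell out JATAC's model, so the following is only a rough illustration of what comparing reads on flow values rather than base calls can look like; it is not JATAC's algorithm. Each flow value is turned into a posterior over homopolymer lengths under a hypothetical Gaussian noise model whose spread grows with homopolymer length (the constants are made up). Two reads are scored by the mean negative log-probability that their aligned flows share the same homopolymer length. Reads are then greedily clustered against the longest member of each cluster. All names (`MAX_HP`, `flow_distance`, `cluster_duplicates`) and the threshold are hypothetical.

```python
import math
from typing import Dict, List, Sequence

MAX_HP = 8  # hypothetical: largest homopolymer length considered


def log_likelihood(x: float, n: int) -> float:
    """Log Gaussian density (up to a constant) of flow value x given homopolymer length n.

    Assumed noise model: standard deviation grows linearly with n (made-up constants).
    """
    sigma = 0.1 + 0.05 * n
    return -0.5 * ((x - n) / sigma) ** 2 - math.log(sigma)


def posterior(x: float) -> List[float]:
    """P(n | x) for n = 0..MAX_HP under a uniform prior, computed stably in log space."""
    logs = [log_likelihood(x, n) for n in range(MAX_HP + 1)]
    m = max(logs)
    weights = [math.exp(v - m) for v in logs]  # largest weight is exactly 1, so total > 0
    total = sum(weights)
    return [w / total for w in weights]


def same_length_prob(x: float, y: float) -> float:
    """Probability that two flow values share a homopolymer length (independent uniform priors)."""
    return sum(a * b for a, b in zip(posterior(x), posterior(y)))


def flow_distance(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Mean -log P(same length) over the flows both reads share (shorter read's length)."""
    k = min(len(a), len(b))
    if k == 0:
        return math.inf
    return -sum(math.log(same_length_prob(a[i], b[i]) + eps) for i in range(k)) / k


def cluster_duplicates(reads: Dict[str, List[float]], threshold: float = 1.0) -> List[List[str]]:
    """Greedy clustering: each read joins the first cluster whose representative is within threshold."""
    clusters: List[List[str]] = []
    reps: List[List[float]] = []
    # Longest reads first, so each cluster is represented by its longest member.
    for name in sorted(reads, key=lambda r: len(reads[r]), reverse=True):
        flows = reads[name]
        for members, rep in zip(clusters, reps):
            if flow_distance(flows, rep) < threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
            reps.append(flows)
    return clusters


# Toy flowgrams (invented values, not real 454 data).
reads = {
    "r1": [1.02, 0.05, 2.10, 0.98, 0.03, 3.05],
    "r2": [0.97, 0.08, 1.95, 1.03, 0.01],        # noisy, shorter copy of r1
    "r3": [0.02, 1.01, 0.04, 2.02, 1.10, 0.00],  # different flow pattern
}
print(cluster_duplicates(reads))  # [['r1', 'r2'], ['r3']]
```

Comparing only the flows that both reads share encodes an assumption: duplicates start at the same position but may end at different lengths, for example after quality trimming. A real tool would also need a justified noise model and threshold instead of the placeholders used here.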
Similar resources
Critique: “Filtering duplicate reads from 454 pyrosequencing”
The paper describes a novel approach for filtering duplicate reads from 454 pyrosequencing data. The problem is motivated by the need to reduce sequencing errors and artificially duplicated reads in applications such as de novo whole-genome sequencing and metagenomics. Existing solutions are often based on nucleotide sequences, while raw flowgram values, which contain additional information...
Correction of sequence-dependent ambiguous bases (Ns) from the 454 pyrosequencing system
Pyrosequencing of the 16S ribosomal RNA gene (16S) has become one of the most popular methods to assess microbial diversity. Pyrosequencing reads containing ambiguous bases (Ns) are generally discarded based on the assumptions of their non-sequence-dependent formation and high error rates. However, taxonomic composition differed by removal of reads with Ns. We determined whether Ns from pyroseq...
454 Pyrosequencing to Describe Microbial Eukaryotic Community Composition, Diversity and Relative Abundance: A Test for Marine Haptophytes
Next generation sequencing of ribosomal DNA is increasingly used to assess the diversity and structure of microbial communities. Here we test the ability of 454 pyrosequencing to detect the number of species present, and assess the relative abundance in terms of cell numbers and biomass of protists in the phylum Haptophyta. We used a mock community consisting of an equal number of cells of 11 hapt...
Primer and platform effects on 16S rRNA tag sequencing
Sequencing of 16S rRNA gene tags is a popular method for profiling and comparing microbial communities. The protocols and methods used, however, vary considerably with regard to amplification primers, sequencing primers, and sequencing technologies, as well as quality filtering and clustering. How results are affected by these choices, and whether data produced with different protocols can be meani...
Lessons learned from microsatellite development for nonmodel organisms using 454 pyrosequencing.
Microsatellites, also known as simple sequence repeats (SSRs), are among the most commonly used marker types in evolutionary and ecological studies. Next Generation Sequencing techniques such as 454 pyrosequencing allow the rapid development of microsatellite markers in nonmodel organisms. 454 pyrosequencing is a straightforward approach to develop a high number of microsatellite markers. There...
Journal title:
Volume 29, Issue
Pages: -
Publication date: 2013